Low-synchronisation Work Stealing under Parallel Data-List Processing in Multicores
Abstract
When data lists are processed in parallel on a multicore, several threads share a workload, each using its own list to get and insert the data items to be processed. When a list becomes empty, its owner thread steals data items from another list, thus balancing the workload according to the processing capacity of each thread, transparently to the programmer. The first algorithm we designed to synchronise work stealing in this context addressed various important issues such as cache locality and memory...
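The scheme described in the abstract (one list per thread, with an idle thread stealing from another list) can be illustrated with a minimal sketch. This is not the paper's low-synchronisation algorithm: it guards every list with an ordinary lock, which is exactly the kind of synchronisation cost such an algorithm would aim to reduce. The names `WorkList`, `steal_half`, `worker` and `run` are hypothetical, and the policy of stealing half of the victim's items is an assumption made for illustration only.

```python
import threading
from collections import deque


class WorkList:
    """One thread's list of data items, guarded by a plain lock (assumption)."""

    def __init__(self, items=()):
        self.items = deque(items)
        self.lock = threading.Lock()

    def get(self):
        # The owner takes the next item from the head of its own list.
        with self.lock:
            return self.items.popleft() if self.items else None

    def insert(self, item):
        with self.lock:
            self.items.append(item)

    def steal_half(self):
        # A thief removes about half of the items (rounded up) from the tail.
        with self.lock:
            n = (len(self.items) + 1) // 2
            return [self.items.pop() for _ in range(n)]


def worker(my_id, lists, process, results):
    mine = lists[my_id]
    while True:
        item = mine.get()
        if item is not None:
            results.append(process(item))
            continue
        # Own list is empty: visit the other lists and try to steal.
        for victim in lists[my_id + 1:] + lists[:my_id]:
            stolen = victim.steal_half()
            if stolen:
                for s in stolen:
                    mine.insert(s)
                break
        else:
            # Every other list looked empty, so this thread stops.
            return


def run(workloads, process):
    """Start one thread per workload and return all processed results."""
    lists = [WorkList(w) for w in workloads]
    results = []
    threads = [
        threading.Thread(target=worker, args=(i, lists, process, results))
        for i in range(len(lists))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run([range(100), [], [], []], lambda x: x * x)` starts four threads of which only the first has work initially; the other three obtain items by stealing. The order of the returned results depends on thread scheduling.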
Similar resources
On Lock-Free Work-stealing Iterators for Parallel Data Structures
In modern programming, high-level data structures are an important foundation for most applications. With the rise of multicores, there is a trend of supporting data-parallel collection operations in general-purpose programming languages. These operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular...
CellCilk: Extending Cilk for Heterogeneous Multicore Platforms
The potential of heterogeneous multicores, like the Cell BE, can only be exploited if the host and the accelerator cores are used in parallel and if the specific features of the cores are considered. Parallel programming, especially when applied to irregular task-parallel problems, is challenging itself. However, heterogeneous multicores add to that complexity due to their memory hierarchy and ...
Efficient Resource Oblivious Algorithms for Multicores
We consider the design of efficient algorithms for a multicore computing environment with a global shared memory and p cores, each having a cache of size M, and with data organized in blocks of size B. We characterize the class of ‘Hierarchical Balanced Parallel (HBP)’ multithreaded computations for multicores. HBP computations are similar to the hierarchical divide & conquer algorithms consid...
A Work Stealing Scheduler for Parallel Loops on Shared Cache Multicores
Reordering instructions and data layout can bring significant performance improvement for memory bounded applications. Parallelizing such applications requires a careful design of the algorithm in order to keep the locality of the sequential execution. In this paper, we aim at finding a good parallelization of memory bounded applications on multicore that preserves the advantage of a shared cac...
Efficient Work Stealing for Portability of Nested Parallelism and Composability of Multithreaded Program
We present performance evaluations of a parallel-for loop implemented with the work-stealing technique. Parallel-for by work stealing transforms the parallel loop into a binary tree using divide-and-conquer. Iterations are distributed among the leaf procedures of the binary tree, and parallel execution proceeds by stealing subtrees from the bottom of the tree. The work s...
Publication date: 2011